Device for data analysis and organization

ABSTRACT

The invention relates to a data processing device comprising a database management system ( 470 ). The management system ( 470 ) can cooperate with an operating system so as to allow the user the creation/input and/or the use of a database comprising at least one data table ( 475 ) which can be broken down into rows and columns. According to the invention, the device furthermore comprises: an autonomous meta-dictionary ( 510 ) for dynamically storing chosen information relating to the structure of each table of the database, and to the relationships between tables, an analysis means ( 530 ) capable of determining and of storing at least temporarily a representation of groups of interrelated columns, and a restructuring module ( 580, 590 ) capable of cooperating with the analysis means and the meta-dictionary with a view to establishing for the user a presentation of the database which takes account of at least one group of thus related columns.

The present invention relates to systems for processing and storinginformation of the database type.

Such systems can be rendered accessible to novice users, with the aid ofappropriate development tools. These tools are very often used byprofessionals (persons skilled in the art), since they considerably cutthe costs of developing applications geared to a database.

Additionally, it is known that the quality of an application is greatlyconstrained by the quality of the initial study of the requirements tobe satisfied, and consequently by the budget allotted thereto. Now,owing to their “mechanical” formulation, the modules obtained with theaid of the aforesaid tools are generally inflexible. In fact, theirflexibility is aimed more toward the tailoring of the user interfacethan to any variations in the more fundamental requirements.

For these reasons, and also as a function of natural changes in therequirements, it is frequently necessary to modify such an applicationsubsequently. Such a later modification is a very unwieldy task, themagnitude of which grows much faster than the complexity of the relevantapplication. At present, it is practically inaccessible tonon-specialists, whereas, done by specialists, it entails costs whichgrow rapidly, to the point that it is often less expensive to redoeverything each time. This situation, which will be illustratedhereinafter in a very simple example, is obviously unsatisfactory.

The present invention has in particular the aim of affording a solutionto this problem.

The data processing device proposed for this purpose is of the typecomprising at least one computer, furnished with a central unit with aprocessor, at least one user peripheral, and a memory, which are run byan operating system, as well as a database management system stored inthis computer, and able to cooperate with the operating system so as toallow the user the creation/input and/or the use of a databasecomprising, for example, at least one data table which can be brokendown into rows and columns.

The expression “database management system” is understood here to coverany system of computer files making it possible to manage tables,irrespective of their mode of physical storage.

The invention includes within this device means which may be referred toas a development and assistance tool.

The assistance function is achieved through the fact that, in a meansforming an autonomous meta-dictionary, chosen information relating tothe structure of the database, typically the tables and therelationships between tables, or equivalent information, is storeddynamically.

Added thereto is an analysis function, which determines and stores (atleast temporarily) a representation of groups of columns whose contentsare related (related columns are interdependent columns, or, moregenerally, columns which are dependent on one another).

In practice, the analysis function can involve a statistical tool,suitable for determining interdependencies, and preferably alsodependencies, between data sets, by enumerating distinct occurrences,and an analysis (drive) module capable of cooperating with themeta-dictionary and with this statistical tool, so as to obtain andstore the said representation of groups of related columns.Advantageously, the statistical tool is based on a means of counting,which preferably operates directly on the columns of each table, in amanner which will be described hereinafter.

The analysis module can be devised so as to repeat the presentation ofsubgroups of at least two columns, until at least one subgroup is foundwhose columns are related, or until the possibilities are exhausted.Preferably, it systematically carries out the presentation of all thepossible different pairs of columns for the said table, preferably alsofor all the tables of the database.

On the basis of the analysis, it is possible to undertake arestructuring, so as to establish for the user, at least in create/inputmode, a presentation (or “view”) of the database which takes account ofat least one group of related columns.

The restructuring module, advantageously associated with a userinterface module, commences by selecting a starting table to beprocessed, columns to be processed from among at least one group ofrelated columns, and a primary relationship key for this group ofcolumns.

The restructuring can be performed by constructing a new table with thedata of a group of related columns, as well as with a relationship keywith the relevant table.

It can also be performed through the fact that in data input mode(form), modify access is restricted by default to a part only of thecolumns of a group of related columns, the others being simply readaccessible, or even inaccessible.

The analysis means and its storage means can operate in different ways,in particular: on request and/or on satisfaction of certain criteria(which may comprise a partial analysis), or else, conversely,continuously, dynamically (optionally in respect of partial analysisonly).

The invention can also be expressed in the form of a process applied toa computer, or else in the form of the novel industrial productconstituted by the development tool.

Other characteristics and advantages of the invention will becomeapparent on examining the detailed description hereinbelow, itsappendices, and the appended drawings, in which:

FIGS. 1 and 2 illustrate two exemplary architectures of informationprocessing systems usable according to the invention

FIG. 3 illustrates the combination of means within which the inventionis manifested,

FIG. 4 graphically illustrates a relationship between two tables,

FIGS. 5 and 5A illustrate the content of two tables, Components andProducts,

FIG. 6 illustrates a report relating to the two tables of FIGS. 5 and5A,

FIG. 7 illustrates a single table corresponding to the two tables ofFIGS. 5 and 5A, and FIG. 7A illustrates the same single table, but withanomalies of input,

FIG. 8 represents the appearance on the screen of a form for inputtinginformation, relating to the components and to the products, managed inone and the same table,

FIG. 9 represents the result of an interrogation query on the basis ofthe monotable structure of FIG. 7,

FIG. 10 is the detailed block diagram of the analysis means according tothe invention,

FIG. 11 illustrates the meaning of the designs of blocks used here inthe flow charts,

FIG. 12 (split into 12A and 12B) illustrates the operational flow chartof the query builder,

FIG. 13 illustrates the operational flow chart of the analysis module,and FIG. 13A illustrates an optional complementary operational flowchart of the analysis module,

FIG. 14 represents the result of a false dependency search query on thebasis of the monotable structure of FIG. 7A,

FIG. 15 shows the selection screen with a view to reorganization,

FIG. 16 (split into 16A and 16B) illustrates the operational flow chartof the selection and reorganization modules,

FIG. 17 represents the joins of the structure with three tablesComponents, Products and Categories,

FIG. 18 represents the form for inputting the components after thereorganizing of the information relating to the products, and FIG. 18Arepresents a simplified form for inputting the components after thereorganizing of the information relating to the products,

FIG. 19 represents the result of the interrogation query after thereorganizing of the information relating to the products, and FIG. 19Arepresents the result of the interrogation query after a newreorganizing of the information relating to the product category, and

FIG. 20 illustrates the operational flow chart of the virtualreorganization variant.

At the end of the description:

Appendix I recalls certain known concepts, which are useful forunderstanding the description,

Appendix II details in text form certain elements of the invention, and

Appendix III consists essentially of SQL commands involved in anexemplary implementation of the invention.

The drawings and appendices to the description are, in essence, ofdefinite character.

Accordingly, they will be able not only to serve in promoting theunderstanding of the description, but also to contribute to thedefinition of the invention, as appropriate.

The computer system of FIG. 1 comprises:

at least one computer 100, the so-called client, with a central unit 110(CPU, random access memory etc.), a screen 120, a keyboard 130, a massmemory 140 (hard disk for example), a pointing peripheral 150 (mouse forexample), a printer 160, and a peripheral 170 making it possible toaccess a network (local or remote) as well as the correspondingsoftware, and

a computer 200, the so-called server, comprising a central unit 210(CPU, random access memory etc.), a screen 220, a keyboard 230, a filestorage system 240 (hard disk for example), optionally a pointing system250, a peripheral 260 making it possible to access the network (local orremote) as well as the corresponding software.

The mass memory (or another memory) of the two computers accommodates anoperating system. For the “client”, an operating system with graphicsinterface, such as Windows, OS/2 or MOTIF for example (registeredtrademarks) will for example be taken. For the server, the graphicsinterface is less useful in certain cases, and it is possible to takeWindows NT, OS/2, Unix, or Novell for example (registered trademarks).The operating system is started up, in a known manner, in general whenthe computer is booted.

A database is installed on the storage system of the server. Each dataitem is represented materially by an ordered string of impressions(magnetic, optical, magneto-optical or the like) on the storage medium(disk or the like), which is accessible for example via a language ofthe SQL type, and with the aid of a database engine, installed on thestorage system. On the “client” computer(s) side, it is sufficient toprovide programs, recorded on the storage system, making it possible toaccess the database via an appropriate language, for example the SQLlanguage.

FIG. 2 illustrates a “monostation” configuration integrating some ofthese elements into a single computer. There is then just one storageperipheral (140+240), and the peripheral allowing access to the network(170) is unnecessary.

Although the invention may be applied to numerous database managementsystems, or be based on a simple file system, it will initially besupposed in the description that the computer system is furnished with arelational database engine. In this regard, mention is made for alluseful purposes of the following works, which are incorporated into thepresent ones by way of reference:

“Les bases de données relationnelles” [Relational databases], by AndréFLORY and Frédérique LAFOREST, published by ECONOMICA, 1996, especiallychapter 2 thereof, with regard to the fundamentals and the objectives ofthe relational model in respect of databases, and chapter 4 thereof inrespect of the theory of normalization of relational databases,

“Relational Databases and Knowledge Bases”, by Georges GARDARIN andPatrick VALDURIEZ, published by ADDISON WESLEY, 1989, especially chapter4 thereof, with regard to the fundamentals and the objectives of therelational model in respect of databases, and chapter 5 thereof inrespect of the theory of normalization of relational databases,

“Concevoir et développer avec Oracle et Case” [Designing and developingwith Oracle and Case], by Hatem SMINE published by Eyrolles, 1994, inparticular chapters 9 and 10 thereof, in respect of the techniques andtools for developing applications geared around a relational database,in the exemplary case of the Oracle database and its development tools.

“Using the Oracle toolset”, by Mike KROHN published by Addison-Wesley,1993, in particular chapters 6, 7 and 11 thereof, in respect of thetechniques and tools for developing applications geared around arelational database, in the exemplary case of the Oracle database andits development tools.

The information processing chain is illustrated in FIG. 3, with:

at 410, a processing unit, for example one at least of the units 110 and210 (FIGS. 1 and/or 2);

at 440, the mass memory, for example one at least of the hard disks 140and 240;

at 450, the operating system which makes it possible to access thephysical information of the mass memory, in the form of a file system460. It will be noted that the physical relationships are shown by solidlines, whilst the logical relationships are shown by dashed lines. Theremainder of FIG. 3 depends essentially on the file system, consequentlyon the mass memory 440 and on the processing unit 410, through theoperating system 450.

at 470, the database engine, which accesses the logical data 475.

at 471, an “internal dictionary” (this being the name generally given,in relational databases, to the information about the structure of thedatabase, which is stored in a manner accessible to the database engine,but inaccessible to direct modification by the user-read only).

at 500, the development tool (general part) which accesses the database475 (tables, forms, status reports, in particular).

As will be seen hereinafter, the development tool 500 works with agenerator of tables, forms and status reports 505, a meta-dictionary510, an analysis module 530, a selection module 570, as well as aphysical reorganization module 580 and/or a virtual reorganizationmodule 590.

The fundamental elements of a database are the “tables”. Each table is aset of rows; each row is organized as one or more columns; and eachcolumn corresponds to a data type. Each row comprises one and only onevalue (the “data item”) for each column of the table. Physically, thestorage of the data can correspond to this table-like structure, or elsebe different.

Each table generally comprises at least one column which contains uniquevalues only (a value is “unique” when it is found only once in thecolumn of the relevant table), or else an equivalent means, row numberfor example, which may or may not be arranged as a column of the table.This column (or one of them) is then termed the primary key. The valueof the primary key allows one-to-one retrieval of the corresponding row,hence the benefit thereof. All the columns which contain unique valuesonly may serve as primary key.

A “details” table has very frequently to reference the information ofanother table (“master”). For this purpose, in the details table theremay be provided a so-called “external key” column, which contains theprimary key of the master table. When jointly accessing the data of twotables, these two tables are related by a “join” (master/detailsrelation): when accessing the master table, the external key is used tosimultaneously bring up all the corresponding details contained in thedetails table; conversely, when accessing the details table, the primarykey of the master table can be used to simultaneously bring upcomplementary data contained in the master table. This concept of joinor relationship between tables is illustrated by the diagram of FIG. 4,with a table of Clients, and a table of Invoices, in the case ofinvoicing management.

The present detailed description will focus on another example relatingto a production line for computers (the products), which aremanufactured by assembling components. It will be seen that two tables“Products” and “Components” can be formulated, having for example thecontents illustrated in FIGS. 5 and 5A. Here, the primary key of the“Components” table is the C_SN column (the serial number of thecomponent); this is also the external key of the “Products” table. Viathe link between the two tables, it is possible to determine, forexample, that the components whose C_SN values are “11”, “13” and “14”form part of the product whose product serial number P_SN is “US1”.

The pair of Products/Components tables is of the master/details typesince:

to a product there corresponds from zero to n components (the zero casewould correspond to a product whose components have not yet been definedor are not yet present in the table of components), and

to a component there corresponds one and only one product.

Such a join between two tables implies certain integrity constraints asto the values of the primary key and of the corresponding external key:to each value of the external key there must correspond one and only onevalue of the primary key.

The collection of joins for interrelating two tables is called a path.Certain database structures are such that several paths can existbetween two of the tables which they contain. For example, two Clientsand Suppliers tables can be related by a path passing through an Orderstable or by a path passing through a Countries table. Conversely, theremay be no path between two tables. For example, two “Production” and“Production_History” tables although having the same structure andcontaining data of the same flavor have no interrelationship (in thesense of the structure of the data).

The use of joins or relationships makes it possible to reduceinformation redundancies. In an ideally designed database, a particularpertinent information item is stored once only (except for the primarykey/external key pairs required for the relationships). This offers asaving of space. Furthermore, updates are made easier. Thus (FIGS. 5 and5A), modifying the name of the product (P_NAME) having a serial numberP_SN equal to US1 involves modifying only a single row (in the Productstable) although it consists of three components.

The data of a table are accessed via a form displayed on the screen. Aform is composed of fields, each of which corresponds to a column of atable. It makes it possible to insert new rows into the table (create),to modify existing rows, to delete rows. A form which makes it possibleto access several tables must then comply with the integrity constraintson the relevant structure.

A report (see example in FIG. 6) presents chosen columns of a table, forrows selected via a query (sometimes for the whole table). The result isdisplayed on the screen and/or printed. A report can interrogate andpresent the data from several tables, but it must then know thecorresponding structure (external keys, primary keys and joins). Inaddition to reports, it is possible to define graphics displayed on thescreen and/or printed.

The reports and the graphics are commonly referred to as status reports.The forms and the status reports are commonly referred to as programs.And the programs access the data via queries.

The status reports are accompanied by interrogation queries. Dependingon the way in which the status report generating tools are designed, theinterrogation query may be considered to be included within the statusreport and/or separate from the status report. Also known are insertionqueries, update queries and deletion queries, which are in principlereserved for forms.

In this detailed description, the so-called SQL (Structured QueryLanguage) query language will be used. The SQL language has formed thesubject of several standardizations, the latest of which is the ANSI/ISOSQL-92 standard. The keywords of SQL commands are presumed to beunderstood by the reader. If necessary, they may be found in theaforementioned standard. Either quotation marks (“), or an apostrophe(') are used as delimiter for a character string. Appendix I of thedescription illustrates the other principal concepts needed tounderstand the present description, namely:

in A-11, calculated columns,

in A-12, restrictive conditions,

in A-13, the general principle of a query (SQL's SELECT command).

The SQL language makes it possible to access relational databases whilstcomplying with an independence between the way in which the data arephysically stored and the way in which these data are manipulatedlogically. Through its power, the SQL language makes it possible toexpress complex processing operations simply, including in particularnested loops (via joins): see Examples A-13-1 and A-13-2. However,recourse to the SQL language is in no way limiting, and the presentinvention can equally well be applied with the aid of a lower levellanguage, which would drive the file system of the storage peripheraldirectly, for example.

In general, an application groups together a collection of programsgeared around a given structure (tables and integrity constraints). Amenu makes it possible to select, within an application, the programwith which one wishes to work.

The development of applications geared around database engines has beenautomated to a greater or lesser extent by development tools. Adevelopment tool is an item of software which offers the programmer alibrary of program generators. These program generators allow anappreciable reduction in development times: when using a programgenerator, the programmer inputs parameters rather than writing thesource code; these parameters are thereafter interpreted so as toproduce a program in source code which the programmer can modify oraugment manually, insofar as he masters the source code.

In principle, the development of applications geared around databaseengines adopts the following methodological approach: the programmerperforms an analysis of the requirements. To do this, he can use adesign method (Merise for example) whose ordered set of operationalrules guides him in systematizing the thought process and in avoidingdesign errors. The analysis makes it possible to define a “data model”which expresses the semantics of the data in a pertinent manner. (A datamodel can be represented graphically by an Entity/Relation diagram.).The requirements analysis phase demands that the programmer has a goodknowledge of relational algebra and of normalization of the structure ofa database.

With an elementary development tool, the physical structure of the datamust be implanted in the database using the “data model”. After this,the development tool makes it possible to define forms and statusreports geared around the structure (based on the internal dictionary)as well as menus. The user can then augment these forms, status reportsand menus.

With a (more complex) advanced development tool, the structure of thedata is input and stored in another dictionary, specific to thedevelopment tool. The data which have been input at this juncture, andthose of the corresponding processing operations are referred to as“meta-data”. This other dictionary, or “meta-dictionary” (510, FIG. 3),is distinct from the internal dictionary. It is geared around a systemof independent files or around tables of the database itself (modifyaccessible, only via the development tool).

The meta-dictionary forms the subject of possible consistency checks bythe development tool and may even build in certain operational rules ofan analysis method (Tools of the CASE type). Using this meta-dictionary,the generator 505 (FIG. 3) of the development tool takes on board thefollowing tasks:

the physical structure of the database is created, with the aid of themeta-dictionary (the internal dictionary is updated accordingly);

the programs and the menus (if any) are generated by the developmenttool;

the reports generator composes several interrogation queries (generallyone per set of tables), and constructs a standard report for each query.

The user can then augment them.

Thus, the complexity of accessing the database is masked to the user toa greater or lesser extent, as a function of the quality (complexitylevel) of the development tool.

All this makes it possible to operate a database. Once operations havebegun, it may be necessary to modify the data model and hence the datastructure, and to do so for various reasons, for example:

the applicational requirement may have changed (change of trade forexample),

the initial requirement was ill-defined (inadequate specifications), ordealt with by a novice.

An ad hoc modification of the structure of the data (addition of acolumn to a table for example) is more or less well accepted by thedevelopment tools:

in the case of a “crude” tool, the user must modify the structure of thedata manually, together with the queries in the relevant forms andstatus reports.

On the other hand, with a more complex, high-performance tool, it issufficient to update the meta-dictionary of the development tool, thisbeing automatically passed onto the physical structure of the data andonto the queries of the forms and status reports.

A more significant modification of the structure of the data (additionof tables, splitting of one table into several tables for example) isvery unwieldy, if one desires to preserve the data already input. Andthe user will come up against several difficulties:

a) the new structure, even if it complies with the relational integrity,may perhaps no longer tally with the requirements of the application.The data may become unutilizable.

b) the syntax of the languages (procedural or otherwise) for accessingthe data is complex and many modifications which are not taken intoaccount by the development tool need to be made manually.

c) the modifications to be made to the queries of the forms and statusreports are complex.

d) modification of the structure is itself tricky and complex if onedesires to preserve the existing data (a very simple example: having anarticle number column, one wishes to alter the latter from numericaltype to character type).

The generation by the development tool of a new version (based on thenew structure) yields the starting versions of the forms, status reportsand menus. Hence, all of the augmentation subsequently made to them islost. In all cases, it is not easy, under the action of a developmenttool, to physically modify the data structure, and to modify theapplication so as to satisfy a requirement which was not envisagedoriginally.

The foregoing shows the importance of the problem posed. The invention,which proposes to solve this technical problem in an elegant andreliable manner, will now be described.

The invention makes it possible to start in all cases from a monotablestructure. It thereafter allows the user himself to modify the structureof the data, once production has begun, as a function of his newrequirements. Indeed, it does not necessitate any particular knowledgeof the data access technique (SQL language for example), or of theprinciples governing relational data structures. It thus allows savingsin analysis time, development time and subsequent modification time. Itis suitable for all relational databases on the market, and also forother data management systems.

The invention provides an improved development tool or means which canalso be regarded as a user environment interface.

This tool is based on a particular meta-dictionary 510 (FIG. 10), theminimum content of which (for the preferred embodiment) is given inA-21.

The meta-dictionary repeats certain information already contained in theinternal dictionary, such as the name and the type of each column ofeach table. It contains much more of it. This meta-dictionary 510 istended by a storage handler, which monitors every event relating to theconditions given in A-21. This monitoring can be limited to the momentsat which the development tool is operating (insofar as it has exclusiverights with regard to the modifications of the said conditions). It canbe wider. It is preferably continuous.

The invention is also based on a query builder 520 (FIG. 10). Thequeries built obey a well-defined general format, here dubbed thecanonical form, the currently preferred version of which is indicated inA-22, whilst examples of canonical queries are given in A-23.

The manner in which the query builder works, in conjunction with themeta-dictionary, is illustrated in FIG. 12. In FIG. 11 will be found themeaning of the designs of blocks used in the flow charts.

After the begin 900, step 910 presents the user with various (available)possibilities for selecting columns, simple and calculated, as well asset functions. The expression set function is understood to mean afunction pertaining to several rows, such as SUM( ), COUNT( ), AVERAGE(), for example. At 911, the user can construct expressions using thecolumns and/or set functions. Thereafter, he can:

at 912, create and link restrictive conditions,

at 913, define one or more (ordered) sorting criteria.

These steps 910 to 913 constitute the initialization of the querybuilder. They are described as carried out by the user. However, thequery builder may equally well operate automatically, receiving thisinitialization on startup, in the form of parameters, which respectivelydefine the elements input at 911, 912, 913. It is important to note thatthis query designer can store, in the form of character strings, thedefinition of the queries, the writing of which it has made possible.

Step 920 determines the set of tables to which the columns andexpressions defined at 911 to 913 belong. This is performed selectivelyfor each of the expressions involved in the canonical query A-22. If acolumn belongs to two tables, it is the primary key of one and anexternal key of the other. The query generator can associate this columnwith either of the two tables (the result of the query will be thesame). The table where the column is primary key will preferably betaken.

Steps 921 to 925 and 929 determine the path connecting these tables (ifthere are several of them). Optionally, step 923 adds one or more tablesso that such a path exists, starting from the meta-dictionary. It willbe noted however that, whenever the implementation of the inventionstarts from a unique table, all the tables created thereafter arenecessarily connected by one and only one path.

Hence, the first two rows of the query A-22 (and the third optional row,if the succeeding rows call columns whose tables do not appear in thesecond row) are constructed at 930 and 931.

In step 940, the “following” of the path (if there is one) and thesatisfying of the restrictive conditions are dealt with in a similarmanner. This serves in deciding (941) to write the WHERE clause, with alist of joins (943) and/or a list of restrictive conditions (949).

Thereafter, the GROUP BY clause is written at 961, if the test 960 soindicates: existence of set functions in the expressions and/orrestrictive conditions.

The ORDER BY clause is written at 971, if the test 970 so indicates (thesorting criteria may comprise set functions).

The HAVING clause is written at 981, if the test 980 so indicates:existence of set functions in the restrictive conditions.

It will be observed that this query builder is itself responsible forall the checks. It suffices to give it, by input, or else by parameterpassing, the selective list of expressions to be processed.

Using the definition of the form, as stored in the meta-dictionary, theinvention also provides for the automatic (dynamic) creation of one ormore forms.

This form is therefore suited to the structure of the data. It allowsthe consultation, insertion, modification and deletion of rows in therelevant table or tables. It complies with the integrity of the data byimposing the following constraints:

the value of a column must always tally with its type (for example,whatever is input into a date-type column must be a data item recognizedas a date).

a primary key column must include only unique values.

an external key column must have a set of values included within the setof values of the corresponding primary key.

Again using the meta-dictionary, the invention makes provision for theautomatic generation of reports and graphics, by the module 505.

A report is compiled on the basis of an interrogation query. It ispossible in a report to define:

formats for presentation or “augmentation” (such as: bold, italics,large characters),

group calculations performed on the data of the table (such as: totals,enumeration),

breaks which condition the rezeroing of the group calculations. Thesebreaks are organized as levels. The highest level termed level zero isthat of the entire report. Level 1 corresponds to the first break (themost general) and the last level corresponds to the information of thebreak of last rank.

restrictive conditions or “filters”, such as: product name(P_NAME)=“COMP1”, cost (COST)>1500.

certain sorting criteria.

According to one aspect of the invention, the breaks are proposed as afunction of the structure of the application: consider a column COL₂which belongs to a set of tables {TAB2 ₁ . . . TAB2 _(p)}, and thepreceding column COL₁, which belongs to a set of tables {TAB1 ₁, . . .TAB1 _(n)}; these two columns are presented in the same break level, if,for every value of i, from 1 to n, and for every value of j, from 1 top, the path for going from TAB1 _(i) to TAB2 _(j) does not comprise anyjoin in the master/details sense. These breaks correspond to thebreaking down of the structure into levels and therefore offer the usera default presentation which is expressive. The user can however modifythem at will and define certain group calculations only for certainbreak levels and also insert or delete breaks.

FIG. 6 illustrates a report presenting the product name, the serialnumber of the component, the cost with:

as restrictive condition: the product name (P_NAME) must be differentfrom “COMP1”,

a break with regard to the product name,

a cost total (COST) for the components,

a count of the number of components.

A graphic corresponds to an inquiry and to a graphical presentation ofvalues of columns for certain rows. In general, a graphic is determinedby parameters such as:

the type of presentation (histogram, pie chart etc.)

the format of presentation (color etc.)

one or more sorting criteria

calculations (sum etc.)

According to another aspect of the invention, a graphic is compiled froman interrogation query comprising from zero to several columns of labeltype and one or more columns of numerical type. It comprises:

The format of the label type column.

The format of the graphic. Certain graphic formats accept only aspecific number of columns of label type and of columns of numericaltype. For example:

The pie chart supports only one column of label type and one column ofnumerical type.

The XY graphic (commonly referred to as a curve) does not supportcolumns of label type and supports only two columns of numerical type(optionally three-dimensional graphics).

If in the interrogation query there are more columns of label type thansupported by the format of the graphic, the module 505 generates as manyexploded graphics as necessary, using a mechanism similar to that ofbreaks.

Example: consider an interrogation query which returns the turnover(equal to the sum of the amounts of the orders) by product and by region(the path between the Products and Regions tables comprises a join inthe master/details sense). If the format of the graphic chosen is a piechart, then a product/turnover pie chart will be generated for eachregion, and a product/turnover pie chart for all the regions together.

With the aid of the meta-dictionary, the invention makes it possible toensure that the relation between the query and the status report (reportor graphic) remains consistent after any reorganization.

The example of FIGS. 5 and 5A gives an already-tailored version of thestructure of tables. In reality, if he seeks to develop a database forthis same example, a novice will generally put everything into a singletable. With the invention, the user can always commence with a singletable, which hereinafter will be dubbed “Components_and_Products” (FIG.7). To do this, he simply defines the name and the type of the datawhich he wishes to input. The development tool has updated themeta-dictionary, created a monotable data structure in accordance withthe name and the type of the columns indicated by the user, and createda default form allowing access to the table, and status reports.

Thus, the user can get going almost instantaneously, without having tothink about the interactional structure between his data: he will,thereafter, input data via the form, and print status reports. FIG. 8represents the form for inputting the information into the“Components_and_Products” table. FIG. 9 represents the result of aninterrogation query based on the monotable structure (with nofiltering).

Even though it makes it possible to commence with a single table, theinvention can also be applied to the case where an experienced developerwill make initial choices for the structure of the tables which areguided by formal analysis thereof and practical considerations, as soonas these choices need to be reconsidered during the subsequentoperations (with more and more data).

Indeed, it is noted that the data of a table having as few rows as thatof FIG. 7 are already redundant. The informed observer will noticeimmediately dependencies between columns: CATEG depends on C_SN, P_SN,P_NAME.

However, at the data level, CATEG also seems to depend on C_ID and onC_DESC, in accordance with the aforesaid criteria; this is because:

the components SCR17, P166, HD1.2 and SCR15 are involved only in themanufacturing of products from the USER category, and

the components SCR14, P200 and HD2 are involved only in themanufacturing of products from the SERVER category.

This dependency (which, because illogical, is false) is due to the factthat the data sample is too narrow to be representative. If a productfrom the USER category is for example manufactured with a hard diskreferenced HD2, the dependency will disappear.

In general, the current structure of the database is no longer suitablewhen certain data are redundant within the table. Stated otherwise, theuser is confronted with anomalies since the data model does not complywith or no longer complies with the various normal forms of therelational model.

In the case of FIG. 7, the user could be confronted with the followinganomalies:

Deletion: the deleting of the single component of a product also deletesthe information relating to the product.

Update: the updating of the information of a product (its name forexample) must be carried out on all the rows of the table correspondingto its components. There are therefore as many modifications to be madeas there are constituent components of the products.

Insertion: a product cannot be created without immediately assigning itat least one component.

Faced with the need to reorganize the structure of the database, theuser would normally be compelled to operate by hand. For this purpose,the present invention provides him with means which may be broken downinto three parts (FIG. 3):

i) the analysis of the data (530),

ii) the selection by the user (570) of the columns of a table which willbe isolated; during this phase, the user is superintended by consistencychecks.

iii) the execution (580) of the reorganization of the data structure.

Phases ii) and especially iii) are optional. Indeed, it will be seenthat it is also possible, as a variant, or as an adjunct, to implementthe invention without physically reorganizing the database (590).

The aspect of the invention which is presently regarded as essential isthe analysis. Here it is based on the meta-dictionary given in A-21(510, FIG. 10), or on an equivalent means.

It is also based on a statistical tool (520, FIG. 10), which, in oneembodiment, consists of a set of SQL queries given in Appendix III, andsuited to various operations for systematic enumeration, as a functionof the current structure of the database, such as it features in themeta-dictionary. These enumerations pertain for the most part only to“distinct values”. The SQL queries interrogate the relevant table ortables directly.

The analysis phase comprises all or some of the operations which willnow be described, with reference to Appendix III. It is driven by amodule 530 (FIG. 10).

In general, the “current” columns denoted C_(i) and C_(j) in thedescription are denoted COL1 and COL2 in Appendix III, in accordancewith the customary practice of persons skilled in the art.

The process firstly comprises an analysis (preferably exhaustive) of therepetitions of the values of the columns of the data tables. Thisanalysis will be described with reference to FIG. 13 and to AppendixIII.

Starting from the begin (1000), the tables of the database aredetermined, or better those used in the application (1010).

For each table, the analysis consists in determining:

at 1012, the number of rows, N.

In the SQL language, the number of rows of a table (step 1012) isobtained through query A-31-1, where TAB1 is the name of the relevanttable.

at 1020, the names of the columns of the table,

This is obtained by reading the meta-dictionary (as a variant, theinternal dictionary).

For each column C_(i), the number of distinct values N_(i), at1024-1026.

The number of distinct values of a column of a table (step 1024) isobtained through the SQL query A-31-2, where COL1 is the name of thecolumn.

then determining and running through the pairs of two different columnswhich can be formed for the table (1030)

This is obtained by reading the meta-dictionary (as a variant, theinternal dictionary), in the form of a simple loop to be designed.

For each pair of columns (C_(i), C_(j)), doing the operations (detailedbelow) making it possible to calculate:

their interdependency ratio.

the number of pairs of values which block the dependency.

For each group of columns of a table (C_(i), C_(j) . . . C_(n)), anycandidate keys (join or relationship) for the group.

We now return to the calculation of the dependency ratio of two columnsof one and the same table. Let Ni and Nj be the respective number ofdistinct values of the two columns C_(i) and C_(j) (calculated at1024-1026). At 1034, we obtain P, the number of distinct pairs of valuesof C_(i) and Cj, through the query A-31-3, where COL1 and COL2 are thenames of the columns C_(i) and C_(j), whilst:

“xyz” is a character string which does not exist in column COL1. It isbest to take any character string which is a priori unlikely. Theabsence of the string “xyz” in COL1 can be verified through the zeroresponse to the query A-31-4.

is the symbol for concatenating the values of two columns, as supportedby the database engine.

Depending on the value P in relation to N_(i) and N_(j), the possiblesituations are:

case 1042, with P>Max(N_(i), N_(j)) there is no dependency between thetwo columns (1043). C_(i) and C_(j) are said to be independent. Thenumber of pairs of values which block the dependency (case of neardependency) may then possibly be calculated.

cases 1046+1047, with P=N_(i) and P>N_(j) then C_(i) determines C_(j).Column C_(i) is said to be the mother of C_(j). Column C_(j) is said tobe the daughter of C_(i).

cases 1046+1049, with P=N_(j) and P>N_(i), then C_(j) determines C_(i).Column C_(j) is said to be mother of C_(i). Column C_(i) is said to bethe daughter of C_(j).

cases 1044+1045, with P=N_(i) and P=N_(j) the C_(i) and C_(j) are saidto be equivalent from the point of view of dependencies, orinterdependent. C_(i) and C_(j) are said to be sisters.

the case P<Min (N_(i), N_(j)) is impossible.

Steps 1050 and 1052 respectively ensure looping around all the pairs ofcolumns and all the tables until the End 1059.

The queries mentioned may be formulated with the aid of the SQL querybuilder already cited (A-22, and 520, FIG. 10). However, it will oftenbe simpler to prepare them individually in advance, in the form ofcharacter strings, certain elements of which are variable. In thequeries A-31, the variable elements are TAB1, COL1, “xyz” (or, what isequivalent ‘xyz’), and COL2. In conjunction with the meta-dictionary,step 1024 (for example) of FIG. 13, will recompose the query A-31-2, byreplacing the names COL1 and TAB1 therein by the name of the column andthat of the table, respectively.

The writing of programs capable of implementing the loops described inFIG. 13 is regarded as accessible to the person skilled in the art onthe basis of the present description, and of a procedural language suchas the C language and its variants (C++ for example), or else SMALLTALK.

The case of “near dependence” is now examined with reference to FIG.13A. This involves a pair of columns COL1, COL2 from one and the sametable, for which each value of COL1 makes it possible to determine in aone-to-one manner the value of COL2 “to within a few exceptions”. Anexample is given in A-24.

Generally, the number of pairs of values, B, which block themother/daughter dependency between two columns C_(i) and C_(j) isobtained (step 1210, FIG. 13A) through the query A-32-1, where a and bare aliases defined in the table TAB1, and the other variables aredefined as before. (a table alias makes it possible to access the tableunder a name other than its own; several aliases with regard to the sametable make it possible to access the latter in two independent ways).

If B is small, it will be beneficial to force the dependency; the userwill then have to normalize the two columns. Columns C_(i) and C_(j) aresaid to be the pseudo-mother and pseudo-daughter respectively. One alsospeaks of “pseudo-sisters” in the particular case where each of the twocolumns plays the role of pseudo-mother and of pseudo-daughter withrespect to the other. This normalization of two columns in order toforce the dependency will now be described.

It is firstly necessary to identify the pairs of values, which block themother/daughter dependency between two columns C_(i) and C_(j). This isdone in step 1230 (FIG. 13A), to which the query A-32-2 corresponds.This makes it possible to assess, for a value of the pseudo-mothercolumn, the frequencies of the various values of the correspondingpseudo-daughter column.

The query A-32-2 comprises a nested SELECT command, which counts thenumber of occurrences (greater than 1) of the pair of values within thetable. A search is made firstly for the values of COL1 which block thedependency. This involves the values of COL1 such that the number ofdistinct values of COL1‘xyz’COL2 is greater than 1.

The query A-32-2 thereupon builds a main SELECT command which returnsthe value pairs found, given in FIG. 14, for the relevant example. Foreach element of the set of values of COL1 which block the dependency(according to the nested query), a search is now made for thecorresponding values of COL2 and for their frequency. Likewise for theNULL value of COL1 (NULL is the case where nothing has been input).

It will be noted that it is possible to go directly to step 1230,without doing the preliminary count of step 1210.

The actual forcing (step 1250, FIG. 13A) is performed by the SQL commandgiven in A-32-3, where:

TAB1 is the name of the table,

COL1 and COL2 are two columns of the table for which one seeks to forcethe dependency in the sense where COL1 will determine COL2,

VALUE2 is the new value allocated to COL2,

VALUE1 is the value of COL1 for the rows to be modified,

VALUE2_1, VALUES2_2, . . . VALUES2_n are the values of COL2 for the rowsto be modified.

However, in the case of values not input into COL1, the forcing isperformed by the SQL command given in A-32-4.

The forcing can also comprise the SQL command given in A-32-5, where:

TAB1 is the name of the table,

COL1 and COL2 are two columns of the table for which one seeks to forcethe dependency in the sense where COL1 will determine COL2,

VALUE1B is the new value allocated to COL1,

VALUE1 is the value of COL1 for the rows to be modified,

VALUE2_1, VALUES2_2, . . . VALUES2_n are the values of COL2 for the rowsto be modified.

We now return to the monotable structure, but according to FIG. 7A (withslightly different content, for illustration). The dependency relationbetween the product number and the product name may be severed (error ofinput for example). The result of the query making it possible todetermine the pairs which block the dependency could be (FIG. 14),expressed plainly:

“There are two components for the COMP1-USER COMPUTER product and onecomponent for the COMP1-USER COMPUTER BIS product; one and the sameproduct number has been used for two different products.”

To force the dependency, it is necessary by choice to:

Allocate another product number to one of the two (A-32-5). In this casethe analysis of the frequencies indicates that it is more probably theUSER COMPUTER BIS row which does not have the right product number.

Replace “USER COMPUTER BIS” by “USER COMPUTER” (A-32-3).

In addition, a product number must be input for the product with serialnumber SRV1 (A-32-4).

Any candidate keys for a group of columns must comply with theconstraint: Each candidate key must be, with regard to all the othercolumns of the group, either mother, or sister. In order for a column tobecome candidate key of a group of columns, it is necessary to force thedependency with all the pseudo-daughters of the group (in the case ofnear dependency).

Preferably, the result of the analysis is expressed and stored in theform of a group of columns, comprising:

the subgroup of the candidate keys, and

the subgroups of their daughter-columns, ranked by level of kinship(daughters of daughters, and so on and so forth), in a tree-like manner.

The consistency checks are made in accordance with the principlesgoverning relational structures; they depend above all on the result ofthe analysis of the data already input, kinship levels in particular.They guarantee the user against unutilizable data and spare him from theneed for any knowledge and understanding of access to the data, and ofthe principles governing data structures (relational for example).

The result of the analysis makes it possible to:

suggest which columns will be able to be isolated from a table,

determine any candidate keys of a group of columns,

make it possible to benefit from an assisted mode of input using thedependencies (possibly a virtual one).

One of the implementations of the invention therefore consists inphysically reorganizing the structure of the tables. Elementarily, thisfunction of reorganizing the structure consists in splitting a table upinto two tables, which will be in a master/details relation.

Although it is possible to proceed otherwise, it is nowadays deemedpreferable to leave the user to choose from among the variouspossibilities which may emanate from the analysis. Such a choice willpertain on each occasion to a table (TAB1), and to a group of columnsS1, S2 . . . Sn of this table (which will form the new table, whereasR1, R2 . . . Rp denote the columns which will remain in the table TAB1),while complying with the following constraints:

the selected group of columns S1, S2 . . . Sn must not include theprimary key of the table (the starting one),

the dependencies between the selected columns make it possible todetermine the candidate keys; the user may possibly select one of them.

This phase is performed for example with the aid of a user interfacehaving windows, like the one of FIG. 15, with a window 91 for selectingtables (here one only), then a window 95 for selecting the columns to beisolated from the chosen table, then selection at 96 of a candidate keyfor the new table (as an option). The name of this new table, which willbe denoted TAB2 hereinafter, is input at 92. Theselection/reorganization phases are illustrated in the form of steps inFIG. 16, in the case of an interaction with the user, with the screen ofFIG. 15.

Starting from the begin 2000, step 2010 uses the meta-dictionary toascertain the list of tables used by the application, which list isdisplayed at 2012. At 2014 the user chooses the table TAB1 to bereorganized. Using the meta-dictionary, at 2020 the system delivers thelist of columns of TAB1. At 2022, these columns are displayed, with theexception of the primary key of TAB1, and at 2030 the user chooses thecolumns Sj to be isolated.

These choices are made via a loop around steps 2032 to 2036. With eachselection of a column, one determines whether this column can serve ascandidate key (2034), by storing this information item.

Thereafter, we go to the actual reorganization, which is slightlydifferent, depending on whether the user has, or has not, chosen acandidate key. It is supposed that the forms defined at the outset droveall the fields of the single table from which one started.

i) If, in steps 2042 to 2046, the user has selected a candidate key,which is assumed to be (S1), the actual reorganization is performed bysteps 2048, 2049 and 2060, in accordance with the SQL commands A-33;

The master table is created by executing the command A-33-1. This SQLcommand creates a table named TAB2, with the result of a SELECT command,nested within the CREATE command.

The original table TAB1 is modified by stringing together the 3 commandsgiven in A-33-2. A temporary table TEMPO is created with thenon-transferred columns, plus the chosen primary key S1 (TEMPO is a nameavailable at database level: there is no other table or view having thisname). Thereafter, TAB1 is erased and TEMPO is renamed as TAB1.

ii) otherwise, in the second case, where the user has not selected acandidate key (and there is perhaps no candidate key for the group ofselected columns), the actual reorganization is performed by steps 2056,2058 and 2060, in accordance with commands A-34.

It is then necessary to introduce or create a column which will be theprimary key in the master table.

The creation of the details table follows the stringing together ofsteps A-34-1, where CODE_TAB2 is the name of the primary key of tableTAB2. CODE_TAB2 and TEMPO are names which must be available at databaselevel. The first row is very similar to the query A-33-1, except thatinstead of creating the table TAB2 directly, a TEMPO view is involved.It is the second row which creates TAB2, by adding to the TEMPO view acalculated column CODE_TAB2, the value of which is ROWNUM, that is tosay the rank of the row in the table, after which TEMPO is erased.

The tailoring of the original table TAB1 is performed as indicated inA-34-2.

Since for the moment the primary key exists only in TAB2, it isnecessary to start from a join between TAB1 (old) and TAB2, which hasjust been created. Hence, the more elaborate first CREATE command.

By stringing together the 3 commands given in A-34-2, a temporary tableTEMPO is created with the non-transferred columns, plus the chosenprimary key S1 (TEMPO is a name available at database level: there is noother table or view having this name). Thereafter, TAB1 is erased andTEMPO is renamed as TAB1.

In both cases, the meta-dictionary is updated (step 2065):

a) the new master table TAB2 formed of the columns S1, S2, . . . Sn istaken into account.

b) in the case of the creation of a primary key for TAB2, the creationof the column CODE_TAB2 in TAB1 is noted; the creation of the columnCODE_TAB2 in TAB2 is noted. The forms which make it possible to drivethe table TAB1 are augmented by the column CODE_TAB2, and no longer makeit possible to drive (input) the columns S1, S2, . . . Sn.

c) in the case where the column S_(i) has been chosen as primary key ofthe table TAB2, the forms which make it possible to drive the table TAB1no longer make it possible to drive the columns S1, S2, . . . Sn, exceptS_(i) which remains driven (but which must comply with the integrityconstraints due to its new role.

d) the primary key of TAB2 is referenced as such.

e) the external key of TAB1 is referenced as such.

f) the information from the master/details relationship between thetables is inserted. Furthermore, it may be necessary to update thepre-existing joins relating to columns of the newly created table.

The process which has just been described in an elementary manner makesit possible to guarantee that the data meta-dictionary remainsconsistent and synchronized with the physical structure of the data;thus, dynamic generation of the forms, interrogation queries and statusreports is ensured without user intervention, and these areautomatically adapted to the new structure, as will be seen hereinafter.

The user can then continue his work on the basis of the new structure.

Beside this, it is known practice to endow a field of a form withproperties such as: “input”, “modification permitted”, “modificationprohibited”, or else “modifiable only if”, with a condition which can inparticular be a restriction relating to certain constraints, or even theaction of the user on a defined keyboard button, or on an icon definedwith the mouse. In a development tool, this can be done at the level ofthe form generator part, working for example with a form definitionfile.

Starting from the moment at which, according to the invention, thesingle table has been partitioned so as to define a separate mastertable, the form which drove this single table will be modified, orreplaced by a new form, whilst complying with the following rule: assoon as a form comprises fields belonging to a master table and fieldsbelonging to a details table, only the fields belonging to the detailstable can be modified; and the values taken by the external key columnmust belong to the set of values taken by the primary key of the mastertable. A separate form is generated to drive the master table.

Thus, FIG. 18 represents the form for inputting the components used inproduction after the reorganizing of the information relating to theproducts. It is noted that the fields P_NAME, P_ID and CATEG are nolonger directly accessible since they do not belong to the Componentstable whereas they did belong to the Components_and_Products table; theyare determined by the value of the product serial number field (P_SN)which is the external key of the Components table which is related tothe primary key of the Products table.

To obtain this form from the previous one, all the fields (columns)which have been transferred to the master table are altered to“non-modifiable”, except for the external key. Specifically, a handlercan be used which will:

fetch the list of columns of the group which served in defining themaster table (gleaned from the memory 550, or from the meta-dictionary510, in the update which stems therefrom);

pull out the relationship key from this list,

search for each column name for this list in the definition of the form,so as to accompany the relevant field with the “non-modifiable”property, and

accompany the field relating to the relationship key with a “modifiableif and only if the new value exists in the master table” property. Thelist of existing values is given by the search query A-37 for searchingin the master table TAB2, of which COL1 is the primary key (the DISTINCTkeyword is optional).

As a variant, or as an adjunct, FIG. 18A represents a simplified formfor inputting the components used. The fields P_ID and CATEG which werefilled in automatically in the form of FIG. 18 have been deleted fromthe form. Only the Product name field (P_NAME) has been preserved so asto check whether the product serial number input is indeed the onedesired. Specifically, the process is the same as above, except thatinstead of accompanying the fields with the “non-modifiable” property,they are quite simply deleted from the definition of the relevant form.

At A-23 will be found a few examples of queries written for thestructure reorganized into two tables (Components and Products).

FIG. 19 represents the result of the interrogation query after thereorganizing of the information relating to the Products. A break (thisbeing a way of illustrating the new structure) has been providedautomatically by the report generator, in response to the presence of anew table. Just as with the forms, such a break can be simply determinedby the “Break” property associated with the relevant column, in thedefinition of the relevant report. Illustrations other than a break canbe envisaged.

In practice, the selection/reorganization cycle (or else theanalysis/selection/reorganization cycle) can be repeated as many timesas necessary, at the very east so long as there is still a group ofrelated columns to be processed.

In the example considered, it is thereafter possible to proceed with anew reorganization of the information relating to the category. FIG. 17represents the join relations of the structure with the three tablesComponents, Products and Categories. FIG. 19A represents the result ofthe interrogation query on these three tables. A second break has beenperformed automatically.

In the embodiment described above, the physical reorganizing of thetables has the following effect:

in data input mode (form), input access (modify, create, delete) isrestricted to some only of the columns of a group of related columns,the others being simply read accessible, or inaccessible;

to write access these other columns, a different form must be used,processing the new table created.

This other form can be rendered accessible from the first, in particularunder the following conditions:

should an external key which does not exist in the master table beinput,

when the user wishes to modify the record of the master table whichcorresponds to the value of the external key, actuating for example akeyboard function button.

However, it would be possible to envisage, after physically reorganizingthe tables, different solutions for tailoring the forms, in particularif it is supposed (doing away with the aforesaid rule) that, in a formwhich comprises fields belonging to a master table and fields belongingto a details table, it is possible to modify fields belonging to thedetails table and fields belonging to the master table.

According to an interesting variant, instead of physically restructuringthe tables, the analysis performed may serve only to modify the mannerof operation of the input forms, behaving as if the database structurehad been reorganized (“virtual reorganization” of the database). Sincethe group or groups of related columns are not reflected by amodification of the structure of the tables in the meta-dictionary, theforms are then tailored on the basis of the memory of dependencies (andof keys) 550.

The tailoring takes account of the levels of kinship between the relatedcolumns of each group. The starting point is a relationship key which isone of the candidate keys and is selected by the user or chosenautomatically.

In a development tool (form generating part, working for example with aform definition file), it is also known practice to endow a field of aform with a “value if empty” property, accompanied by an expression, orby a function, the result of which will be taken as the value of thefield if it is empty (a little like a “default value”). Likewise, the“the value has been modified by the user” property is also known.

A request which searches, in the same table, for at least onepre-existing row having the same value for the relationship key willthen be taken as expression. This query may be based on the querybuilder, or else be rebuilt each time, on a base prepared in advancepiecewise, and stored.

In this virtual reorganization mode, a form can work as follows:

During the inputting into the field of the relationship key, the emptyfields dependent on this key (sisters and daughters) are automaticallyfilled in by default.

Thus, in the form of FIG. 8 (monotable structure), the inputting of“US1” as product serial number (P_SN) causes the automatic inputting ofthe Product reference field (P_ID) which is a sister column and of theProduct name field (P_NAME) and category field (CATEG) which aredaughter columns.

The field P_ID (COL2) is filled in automatically by the result of thequery A-36 searching for “US1” in the column P_SN (COL1) of the alreadyexisting rows. The DISTINCT keyword is optional in the query A-36, assoon as any row found contains the sought-after result for COL2 (P_ID).

The user can at leisure overwrite the values filled in by default. Hecan thus sever the dependency between two columns and the operation ofthe virtual reorganization mode is then interrupted for this pair ofcolumns.

For example in the monotable structure, there is a dependency betweenthe Cost column (COST) and category column (CATEG): the inputting of acost automatically fills in the category field. If the user changes the(automatic) value of the Category field, the dependency between Cost andCategory is severed: the inputting of the Cost field will no longerautomatically fill in the Category field. The memory of dependencies 550is updated accordingly, and the operation of the form is modified.

This is as it were a dynamic coupling between the analysis and the inputforms. Whereas the structure of the tables is not modified, the datarelating to the operation of the input form or forms remain, on theother hand, physically modified in a manner which simulates therestructuring of the tables.

This can be performed systematically, according to the steps of FIG. 20.On validating 2200 a field, and if the latter has been input manually(2210), the value input is placed in a memory area of the list type, orcontainer, in step 2220. So long as this container contains at least oneelement (2230), its first element (which disappears from the container)is extracted at 2240 so as at 2250 to determine its dependent columns(direct daughters and sisters of the field considered at 2200). Thefields relating to the sister columns are filled in automatically, ifthey are empty, at 2260. In step 2270, the same is done for the directdaughter columns, which will moreover be placed in the container. (A isthe direct daughter of B if there exists no column C, other than A andB, such that B is the mother of C and C is the mother of A). And we thenreturn to 2230. The process terminates at 2290 when the container isempty. This makes it possible to process the sisters of the fieldtargeted at 2200, as well as to run through the complete tree of itsdaughters, granddaughters (and their sisters), and so on and so forth.

In data input mode (form), it will be possible to compel (by default orfirmly) the data present in related columns of a row to remainconsistent with at least partially identical data already existing inother rows, with the aid of a query of the same type as above. Thedefault compulsion is aimed at simplifying input by suggesting valuesdependent on values already input. Firm compulsion corresponds to thecase where one wishes to lock the dependency or the interdependencybetween the columns of a group.

The analysis can be rendered dynamic, with updating in real time or nearreal time, in the following way:

in the case of a created row, the query A-32-1, performed for all thepairs of columns, can be employed to determine whether anything haschanged in the groups of related columns. If so, the analysis is redone.As a variant, one can also decide directly to always redo the analysison a create row. In practice, this analysis will be simplifiable, sincethere is only one new row to be compared with all the others.

in the case of a deleted row, it is sufficient to search for whether itsdisappearance re-establishes the dependency.

lastly, a modify is equivalent to a delete followed by a create.

In practice, the foregoing will be simplifiable, since there is only onenew row to be compared with all the others.

Before the analysis phase, it may be useful to carry out a prior step ofprenormalizing the data of each column, or of certain at least of thecolumns (in particular of character type). This is because the user mayhave input two different values having identical semantics (for examplethe cities ‘Paris’ and ‘PARIS’ are semantically identical althoughdifferent from the point of view of their character string).

The present invention profoundly modifies the customary developmentcycle: the functional dependencies are deduced from the dependenciesbetween the data. This minimizes the analysis phase. This also makes itpossible to highlight dependencies which are not envisaged or notpresent originally. The present invention may therefore also be used asa tool for searching for correlations between the data and as a tool fordividing the data into domains.

The invention thus proposes a means making it possible in particular toreorganize a database, after a certain operating time. This means can beimplemented on the initiative of the user alone. It can be proposed atregular intervals, over time, or else as a function of the growth of thedatabase. Before such a proposal, or else in a general way, the analysispart can be implemented automatically, at least partially, so as todetermine whether there is actually anything to do.

It would also be conceivable for this analysis part to be kept up todate in the background. More generally, the analysis means and itsstorage means then operate continuously, dynamically (rather than fromtime to time, on the user's request, or on prompting by the developmenttool). Thus, the station (or one of them) can be furnished with ahandler (or “trigger”), which triggers the updating of the analysis assoon as an event occurs which may influence this analysis. To do this,it may be necessary to continuously track (directly or indirectly) allthe variables processed during the analysis phase.

It will be noted that, even if it indicates that a physicalreorganization is desirable, the analysis phase is not necessarilyfollowed by such a reorganization. The user can in fact refuse thereorganization for various reasons, such as the time which it will take,or the fact that he is not convinced of the utility of dividing histable into two parts, for example.

The present invention applies in general to any type of data, any typeof relation between the data, any type of database (not necessarilyrelational), any type of database access language, any type of databasearchitecture, any type of operating system, any type of storage mediumand of storage system.

The invention therefore to extend [sic] its effects to any filemanagement system accessible via a programming language which would makeit possible to write the detailed equivalent of the SQL commandsmentioned in the description. The power of the SQL query builderdescribed implies that the latter can generate (in particular) all therequisite statistical queries. With other languages, it may be necessaryto create a module (procedure) for each of these statistical queries. Insuchlike cases, the structure of FIG. 10 may be modified: thestatistical tool 520 would receive only data to be processed, whilst theinterrogation of the tables so as to access the columns would be donedirectly by the analysis module 530.

From another standpoint, with the statistical tool described, eachstatistical operation is conducted on all the rows, for one or twochosen columns. A variant of the SELECT COUNTs of Appendices A-31(possibly A-32) would consist in running through the values of therelevant column, while performing a detect new value: if the value hasalready been encountered, we go to the next; otherwise, this value isreferenced as being one of the values taken by the column, a counter isincremented and we go to the next value. At the end, the value of thecounter gives the number of distinct occurrences taken by the column.For COL1“xyz”COL2 we do the same thing by considering the concatenationof COL1, “xyz” and COL2 as a simple column (by considering as beforethat “xyz” is chosen so as not to create any possible confusion). Byprocessing this in a procedure, it would be conceivable tosimultaneously process several variables in a row (for example, toconsider COL1, COL2 and their concatenations such as COL1“xyz”COL2),then to subsequently sweep through all the rows, processing thesevariables selectively. All the variables (of the kind COL1, COL2 andCOL1“xyz”COL2) which can be defined from all the columns of the tablecan even be processed in one go.

According to another interesting variant, it is possible to use rankedsets or “lists”, in each of which, for each value of a data item of acolumn, the rows (designated for example by a number) which use thisvalue are cataloged.

This will be described with reference to two columns denoted COL1 andCOL2 (Appendix A-25). For each value of each column, the rows which usethis value are determined. The sets obtained for the two columns arecompared:

if they are strictly identical, the two columns are sisters.

if each set for a first column is equal to the a [sic] set for thesecond column, or else to the union of several sets for the secondcolumn, then the first column is the daughter of the second.

if there is no combination of sets for a column such that their uniongives a set identical to one of the sets for the other column, thecolumns are independent.

In Appendix A-25-1, it is observed that

E21=11 U E13 and that E22=E12 hence COL2 is the daughter of COL1.

In the example of Appendix A-25-2 (one row added), it will be observedthat there is no combination of E11, E12 and E13 whose union is E21;COL2 is therefore not the daughter of COL1. Likewise, there is nocombination of E21 and E22 whose union is E12; hence COL1 is not thedaughter of COL2. COL1 and COL2 are consequently independent.

An operation of comparing the unions and/or intersections of rankedlists (of a predefined format) is accessible to the person skilled inthe art. On that basis, the construction of a handler performing theabove functions for carrying out the analysis is likewise accessible.This handler is more efficient than the aforesaid SELECT commands, atleast for certain applications.

Likewise, storage in row form is not obligatory. It is for examplepossible to imagine, for each column, that a value be stored just oncephysically and that the list of row numbers (“indices”) where this valueappears be stored in parallel (or equivalent storage). A row is thenreconstructed dynamically by retrieving the various values of thecolumns of this row by searching for the row number in the listsassigned to each column.

Let the table consist of three columns COL1, COL2 and COL3 illustratedin A-26-1. For this table, the index sets (row numbers) can beconstructed for each value, as indicated in A-26-2. The set of rownumbers for rows with the value “X” will now be denoted X.

Relations A-26-3 show that COL1 determines COL2. In the same way, wededuce that COL2 determines COL3 and that COL1 determines COL3 (if onlyby transitivity).

But COL2 does not determine COL1 since

A∩a≠a.

It is desired to reorganize this table so as to culminate in thefollowing structure:

TAB1 (COL1, COL2)

TAB2 (COL2, COL3)

COL2 will become an external key of TAB1 and the primary key of TAB2.

To reorganize, another series of index sets is defined for the primarykey of the new table (COL2) and the index sets of the other columns ofthe new table are updated. The new structure is therefore that given inA-26-4, where the row numbers in the table TAB2 are distinguished by“bis”.

The join is made by establishing the correspondence between the twoindex sets for the pivot column (which is an external key of one tableand the external key of the other table). For example, if one seeks thevalue of COL3 corresponding to COL1=“C”, the pathway is as follows:

value C for COL1 index 4 of TAB1 value b for COL2index 2bis of TAB2value α for COL3.

Likewise, if one seeks to ascertain the values of COL1 for COL3=a, thepathway is as follows:

value α for COL3 index 1bis and 2bis of TAB2 values a and b for COL2indices 1, 2, 3 and 4 of TAB1 values A, B, C for COL3.

We now return to the starting table to show the manner of operation ofthe tracking of a dependency during the input/modification/deletion of arow. By adding a row, the table becomes that illustrated in A-27-1. Themembership index sets for each value become those given in A-27-2.

To check that the dependency relation between COL1 and COL2 has not beensevered, it suffices to check that the set of indices corresponding tothe newly input value of COL1 is always included (in the wide sense) inthe set of indices corresponding to the newly input value of COL2:

It is therefore necessary to check whether {1,2,6} {5,6}={1,2,6}.

This is not the case, hence the dependency has been severed.

The indices of the rows which block the dependency are obtained easilyby considering the empty intersections of the index sets which areneither empty nor equal to the set of indices of the near-daughtercolumn.

Thus, Appendix A-27-3 shows that the dependency is blocked by thefollowing set of two index sets:

{{1,2},{6 }}.

When deleting a row, the dependency is re-established only if the set ofsets of indices which blocks the dependency does not contain an element.

Thus, deleting row 1 gives {{2},{6}}. There is more than one set, thedependency is therefore not re-established. However, deleting row 2gives {{6}}. The dependency is re-established.

As already indicated, the modifying of a row can be treated for exampleas the stringing together of a delete and a create in any order.

Being geared around hardware means which carry out operations, theinvention could also be expressed in process form. It also culminates ina markedly improved development tool.

The simplest version of this development tool comprises the means makingit possible to carry out the following functions:

create an application with monotable structure.

create input forms as a function of the current structure.

generate interrogation queries and status reports adapted to the currentstructure.

To which is added, according to the invention:

as an option, the prenormalizing of the data (analysis).

the analysis of redundancies and interdependencies of the data(analysis), and

the physical reorganizing of the structure of the data as a function ofthe result of the analysis as well as the forms, status reports andreports (reorganization), or else the “virtual” reorganization describedearlier.

In a more advanced version, the said tool will comprise the means makingit possible to carry out the following operations, in order to fetch(import) an already existing file:

create a monotable structure making it possible to fetch the data of afile in tabular format.

determine for the file the column delimiter:

In each row, the number of occurrences of each character is counted. Thedelimiter belongs to the set of characters having the same, nonzero,frequency in each row of the file. If the set is empty, the file is notacceptable. If the set comprises more than one element, a choice ofdelimiter is requested of the user. Once the delimiter has beendetermined, possibly after having accepted signals from the user via theuser interface, each row is divided into an identical number of columns.

determine the format of each column of the file:

For simplicity, it is supposed that the data can only be in numerical,alphanumeric or date format: in order for a column to be in numericalformat, all the values of this column must be in numerical format. Inorder for a column to be in date format, all the values of this columnmust be in date format. If this is not the case, the column is inalphanumeric format. For the numerical format, the width and theprecision of the column are determined: the width is equal to the sum ofthe maximum number of digits coming before the decimal separator and ofthe maximum number of digits coming after the decimal separator. Theprecision is equal to the maximum number of digits coming after thedecimal separator. For the alphanumeric format, the width of the columnis determined. The width is equal to the maximum number of characters ofthe column.

create a table in the previously determined formats.

The SQL command is of the form given in A-35, where TAB1 is the name ofthe table to be created, COL1 . . . COLn the name of the columns of thetable and FORMAT1 . . . FORMATn the corresponding formats; FORMATi hasone of the following forms:

VARCHAR2 (L) for the alphanumeric format where L is the width of thecolumn. Certain databases do not support the VARCHAR2 format of theSQL92 standard, so the CHAR format is then used.

NUMBER(L, P) for the numerical format where L is the width of the columnand P its precision.

DATE for the date format.

The tool can also comprise the means making it possible to carry out thefollowing operations:

create forms making it possible to access the data contained in thetables, with the validation of the insertion, of the deletion or of themodification of a row being possible only if the following constraintsare complied with:

The value of a column must always tally with its format.

A primary key column must comprise unique values only.

An external key column must have a set of values included within the setof values of the corresponding primary key.

It may also comprise the means making it possible to createinterrogation queries where the user specifies simply the ordered listof simple columns or of calculated columns, the list of restrictiveconditions and the list of sorting commands.

This corresponds to the query builder, which it will be noted exhibitsintrinsic interest, independently of the use which is made of it here.This observation applies to other elements of the present invention.

In addition, the development tool advantageously comprises means forgenerating status reports, with:

a report generator for presenting the result of the interrogationqueries, where the user specifies simply the interrogation query, thedefinition of breaks, any group calculations, the format ofpresentations of the columns and of any group calculations, and/or

a graphics generator for presenting the result of the interrogationqueries where the user specifies simply the interrogation querycomprising from zero to several columns of label type and one or morecolumns of numerical type, the format of the column of label type(possibly), the format of the graphics.

In the foregoing, a development tool has been defined which is intendedto work with a pre-existing database management system. An interestingvariant, in particular with the management lists (Appendices A-25 etseq., and their description), consists of an integrated system whichcomprises the management of tables and the analysis and/orreorganization tools described.

In all the embodiments described or envisaged, it is possible to replaceor supplement the physical reorganization of the tables with thevirtually reorganized database structure mode, for input into forms.

Instead of operating exhaustively, the analysis module could stop as afunction of chosen criteria, including the fact that it has alreadyfound a group of related columns (with no false dependencies). It wouldalso be possible to string together directly with regard to therestructuring, without selection by the user, as soon as the analysisgives a group of related columns. The relationship key could then bechosen, from among the candidate keys, as:

the shortest column of this group, or

a numerical column, or

a new key could be created systematically.

With appropriate precautions, it would also be possible to automate theforcing of dependency, at least in flagrant cases, or else to make acorresponding default proposal to the user.

Lastly, although it is currently preferred that the processing andstorage of the data be close together, it is possible to envisage remotestorage, for example in the manner of network computers.

APPENDIX I

A-11—Calculated Columns

A calculated column is a formula consisting:

of simple columns or of calculated columns.

of operators and of mathematical functions.

of set operators (sum, average, count for example).

Examples

PRICE_ET*1.206 where * is the multiplication symbol and PRICE_ET acolumn of a table corresponding to a price excluding tax.

PRICE_ET*VAT_RATE where VAT_RATE is a column of a table corresponding toa VAT rate.

Count(NO_INVOICE) where count( ) is an enumeration operator, andNO_INVOICE is the primary key column of a table comprising data relatingto invoices.

Sum(UNIT_PRICE*QUANTITY_ORDERED) where Sum( ) is a summation operator,UNIT_PRICE is the unit price column of a table of articles andQUANTITY_ORDERED, the quantity ordered column of a table of orders.

A-12—Restrictive Conditions

A restrictive condition is composed:

of a simple column or of a calculated column.

of a comparison operator.

possibly, of a value or of a set of comparison values.

Examples

CITY=“Paris” The column corresponding to a city must be equal to thevalue “Paris”.

CITY from among(“Paris”, “London”, “Washington”) where among( ) is amembership operator

PRICE_ET*VAT_RATE<1000

Count(INVOICE)>3.

A-13—Queries—principle

For its part, an interrogation query consists of:

an ordered list of simple columns or of calculated columns.

a list of restrictive conditions.

a list of sorting commands.

The list of simple and calculated columns satisfies the followingconstraints: it does not contain any external key either directly (caseof a simple column) or indirectly (in the definition of a calculatedcolumn).

The restrictive conditions are related by the ‘AND’ and ‘OR’ Booleanoperators, as well as possibly the negation operator denoted ‘NOT’.

Examples

The two examples below are based on two tables “STUDENTS” (n rows) and“UNIVERSITIES” (p rows). They are related by a join with regard to anidentifier of the relevant university concerned, UNIVERSITYCODE, whichis external key in “STUDENTS” and primary key in “UNIVERSITIES”.

A-13-1

SELECT * FROM STUDENTS, UNIVERSITIES returns a Cartesian product of thetwo tables (n times p rows), which can be regarded as a new tablereviewing all the universities for each student.

A-13-2

SELECT * FROM STUDENTS, UNIVERSITIES WHERESTUDENTS.UNIVERSITYCODE=UNIVERSITY.UNIVERSITY CODE returns n rows only(1 per student). The SQL engine has created a loop in order to arrive atthis result: we loop around the STUDENTS table and for each student wesearch for the information of his/her university by virtue of theUNIVERSITYCODE.

APPENDIX II

A-21—Meta-dictionary

The meta-dictionary (of the development tool), which is stored in thedatabase, contains, as a minimum:

the information relating to the data structure:

the names of the tables.

the names of the columns of each table.

the type of each column (character, numerical, date for example).

for each table, the indication of the column which is the primary key.

for each table, the columns which are external keys (possibly none).

the relationships between the tables together with their cardinalityrelationship (sense of the master/detail relationship).

the definitions of the forms:

the reference of the table (or of the tables) to be modified by virtueof the form.

the references of the columns belonging to other tables related to thetable.

the definitions of the reports:

the reference of the columns included in the report.

any calculations to be performed on these columns.

the restrictive conditions on certain columns.

the display formats of each column.

other parameters (sort commands, breaks, title of the report, etc.).

the definitions of the graphics:

the reference of the columns included in the graphic.

any calculations to be performed on these columns.

the restrictive conditions on certain columns.

the type of graphic (histogram for example).

other parameters (title of the graphic, etc.)

A-22—Query: Canonical Form

It is:

(Row No.) SELECT Col₁, Col₂, . . . Col_(n) (1) FROM TAB₁, TAB₂, . . .TAB_(p) (2) TAB2₁, TAB2₂, . . . TAB2_(q) (3) WHERE COL_(a) = COL_(b) ANDCOL_(c) = COL_(d) . . . , AND (4) COL_(r) = COL_(s) AND COND_NOT_ENS₁AND COND_NOT_(—) (5) ENS₂ . . . COND_NOT_ENS_(t) GROUP BY COL_NOT_ENS₁,. . . COL_NOT_ENS_(u) (6) HAVING COND_ENS₁, . . . COND_ENS_(v) (7) ORDERBY ORD₁, . . . ORD_(w). (8)

This SQL SELECT command is decomposed over several lines (here, apartfrom exceptions which are mentioned or obvious, indented lines form inprinciple part of the same SQL command as the previous line).

The first line of the query indicates as “Col₁, Col₂, . . . Col_(n)” thelist of columns (simple or calculated) which will be extracted by thequery.

The second line determines as {TAB₁, TAB₂, . . . TAB_(p)} the set oftables to which the simple columns and the components of the calculatedcolumns belong, firstly for the list of columns, but also for the restof the SQL command, essentially the restrictive conditions.

It should be observed that we are dealing with a set: a table appearstherein once only even though several columns of this table form part ofthe list of columns (for example)

In the third line come the restrictive conditions whose columns (simpleor calculated) do not comprise, directly or indirectly, any setfunctions (sum, average, etc.); these conditions COND_(i) are of theform:

COL_(i) OP_(i) VAL_(i)

where COL_(i) is a column (simple or calculated),

OP_(i) is an SQL language operator (for example = equality, !=difference, < less than, IN among, IS NULL is not filled in).

VAL_(i) a value, a set of values or nothing according to the choice ofthe operator OP_(i).

The conditions are related by the Boolean operators.

They can be bracketed by parentheses so as to resolve any ambiguityregarding the priority of the operators: for example COND₁ AND (COND₂OR—COND₃). By virtue of knowing the structure of the tables, the joinsbetween the tables are constructed: COL_(a)=COL_(b) where COL_(a) is theprimary key of one table and COL_(b) the corresponding external key ofanother table, so that the set of joins constitutes the minimum butcomplete path passing through all the tables of the set {TAB₁, TAB₂, . .. TAB_(p)}. Other tables {TAB2 ₁, TAB2 ₂, . . . TAB2 _(q)} than thosebelonging to {TAB₁, TAB₂, . . . TAB_(p)} may be involved in building thepath. Several columns of the same table may be involved in the joinsconstituting the path. Such a path always exists.

The fourth line of the query relates to the restrictive conditions forthe columns (simple or calculated) which do not contain, directly orindirectly, any set function; it is of the form:

COND_NOT_ENS₁ AND COND_NOT_ENS₂ . . . COND_NOT_ENS_(t)

The third and fourth lines (if one at least exists) are preceded by thekeyword ‘WHERE’ and are related, if the two exist, by the keyword ‘AND’.

The fifth line of the query relates to the columns (simple orcalculated) which do not contain, directly or indirectly, any setfunction; let COL_NOT_ENS₁, . . . COL_NOT_ENS_(u) be these columns.

Thereafter, in the sixth line come the restrictive conditions on thecalculated columns which all contain at least one set function and aredenoted COL_(ENS) _(i).

Lastly, in the seventh line come the sort commands denoted ORD_(i). Asort command comprises a column (simple or calculated) and a directionASC (ascending) or DESC (descending).

Only parts 1 and 2 are always present.

The third part is present if two tables (at least) are involved in thelist of columns or of conditions.

The remainder is optional.

A-23—Examples of Canonical Queries

A-23-1

SELECT P_SN, P_NAME FROM PRODUCTS

Gives the list of columns {product serial number, product name}. Thecolumns P_SN and P_NAME belong to the same PRODUCTS table. Hence, only asingle table is involved and hence there is no third part (see A-22,above).

There is no restrictive condition on calculated columns which do notcomprise any set function, nor are there any calculated columns whichcomprise set functions, nor is there any restrictive condition oncalculated columns comprising set functions, and lastly, no sort.

A-23-2

SELECT P_SN, P_NAME FROM PRODUCTS

WHERE P_SN<>‘US1’

ORDER BY P_NAME ASC

List of columns {product serial number, product name}. Restrictivecondition: the product serial number is different from ‘US1’

Sort command: product name, ascending.

A-23-3

SELECT PRODUCTS.P_SN, P_NAME, SUM(COST)

FROM PRODUCTS, COMPONENTS

WHERE PRODUCTS.P_SN=COMPONENTS.P_SN

AND PRODUCTS. P_SN<>‘US1’

GROUP BY PRODUCTS.P_SN, P_NAME

HAVING SUM(COST)>1000

ORDER BY P_NAME ASC

List of columns {product serial number, product name, Sum(Cost)}

Restrictive conditions:

The product serial number is different from ‘US1’

Sum(Cost)>1000

Sort commands: P_NAME ascending.

The Product serial number and Product name columns belong to the samePRODUCTS table. The Cost column belongs to the COMPONENTS table. Hence,two tables are involved and the third part of the query is‘PRODUCTS.P_SN=COMPONENTS.P_SN’.

When there is an ambiguity regarding the name of a column, SQL requeststhat the ambiguity be resolved by specifying in front of the name ofcolumns which could lead to confusion the name of the table to whichthey refer (followed by a prescribed separator, in SQL a fullstop).

There is a calculated column which comprises a set function (sum) andhence the fifth part of the query is: ‘GROUP BY PRODUCTS.P_SN, P_NAME’.

There is a restrictive condition on a calculated column comprising a setfunction and hence the sixth part of the query is:

‘HAVING SUM(COST)>1000’.

A-24—“Near Dependency”

Consider a table with two columns and seven rows:

COL1 COL2 A 1 B 2 A 1 C 2 D 3 D 4 D 4

The row D3 or the rows D4 block dependency, since D cannotsimultaneously determine 3 and 4, if there is dependency.

A-25—Dependency by Lists

A-25-1—Example 1

COL1 COL2 A a B b C a A a

for COL1 the sets are:

for “A”, E11={row 1, row 4},

for “B”, E12={row 2},

for “C”, E13={row 3}.

for COL2 the sets are:

for “a”, E21={row 1, row 3, row 4},

for “b”, E22={row 2}

These sets may be manipulated for example as variables of an array(ARRAY) with two dimensions, the first for the relevant value, and thesecond for the list of rows, a character string with no space, where therows are designated by their straightforward number, are separated by acomma, and are sorted in ascending order. If E1 and E2 are the names ofthese arrays for COL1 and COL2, we shall write for example:

E1[1,1]=“A” and E1[1,2]=“1,4”

E1[2,1]=“B” and E1[2,2]=“2”

E1[3,1]=“C” and E1[3,2]=“3”

E2[1,1]=“a” and E2[1,2]=“1,3,4”

E2[2,1]=“b” and E2[2,2]=“2”

An even simpler notation (not borrowed from computing) will consist herein designating each set by the value which it relates to, i.e.:

A=“1, 4”

B=“2”

C=“3”

a=“1,3,4”

b=“2”

A-25-2—Example 2

COL1 COL2 A a B b C a A a C b

For COL1 the sets are now;

E11={row 1, row 4},

E12={row 2},

E13={row 3, row 5}.

for COL2 the sets are now:

E21={row 1, row 3, row 4},

E22={row 2, row 5}

A-26

A-26-1

COL1 COL2 COL3 Row 1 A a α Row 3 A a α Row 3 B a α Row 4 C b α Row 5 D cβ

A-26-2

A: {1,2} a: {1,2,3} α: {1,2,3,4} B: {3} b: {4} β: {5} C: {4} c: {5} D:{5}

A-26-3

A∩a=A B∩a=B C∩a=Ø D∩a=Ø

A∩b=Ø B∩b=Ø C∩b=C D∩b=Ø

A∩c=Ø B∩c=Ø C∩c=Ø D∩c=D

A-26-4

TAB1 TAB2 A: {1,2} a: {1,2,3} {1bis} α: {1bis,2bis} B: {3} b: {4} {2bis}β: {3bis} C: {4} c: {5} {3bis} D: {5}

A-27

A-27-1

COL1 COL2 COL3 Row 1 A a α Row 3 A a α Row 3 B a α Row 4 C b α Row 5 D cβ Row 6 A c β

A-27-2

A: {1,2,6} a: {1,2,3} α: {1,2,3,4} B: {3} b: {4} β: {5} C: {4} c: {5,6}D: {5}

A-27-3

A ∩ a = {1,2} ≠ A B ∩ a = B C ∩ a = Ø D ∩ a = Ø A ∩ b = Ø B ∩ b = Ø C ∩b = C D ∩ b = Ø A ∩ c = {6} ≠ A B ∩ c = Ø C ∩ c = Ø D ∩ c = D

APPENDIX III

A-31

A-31-1

SELECT COUNT (*) FROM TAB1

A-31-2

SELECT COUNT (DISTINCT COL1) FROM TAB 1

A-31-3

SELECT COUNT (DISTINCT COL1‘xyz’COL2) FROM TAB1

A-31-4

SELECT COUNT (DISTINCT COL1) FROM TAB1 WHERE COL1 LIKE “%xyz”

A-32

A-32-1

SELECT COUNT (DISTINCT a.COL1‘xyz’a.COL2) FROM TAB1 a

WHERE a.COL1 IN

(SELECT b.COL1 FROM TAB1 b GROUP BY b.COL1

HAVING COUNT(DISTINCT b.COL1‘xyz’b.COL2)>1)

OR COL1 IS NULL′

A-32-2

SELECT a.COL1, a.COL2, COUNT(a.COL1‘xyz’a.COL2) FROM TAB1 a

WHERE a.COL1 IN

(SELECT b.COL1 FROM TAB1 b

GROUP BY b.COL1

HAVING COUNT(DISTINCT b.COL1‘xyz’b.COL2)>1)

OR a.COL1 IS NULL

GROUP BY a.COL1, a.COL2

ORDER BY a.COL1, COUNT (a.COL1‘xyz’a.COL2)DESC

A-32-3

UPDATE TAB1 SET COL2=‘VALUE2’ WHERE COL2 IN (‘VALUE2_(—)1’,‘VALUE2_(—)2’, . . . ‘VALUE2_(n)’) AND COL1=‘VALUE1’

A-32-4

UPDATE TAB1 SET COL1=‘VALUE1’ WHERE COL2 IN (‘VALUE2_(—)1’,‘VALUE2_(—)2’, . . . ‘VALUE 2_n’) AND COL1 IS NULL

A-32-5

UPDATE TAB1 SET COL1=‘VALUEIA’ WHERE COL2 IN (‘VALUE2_(—)1’,‘VALUE_(—)2’, . . . ‘VALUE2_n’) AND COL1=‘VALUE1’

A-33

A-33-1

CREATE TABLE TAB2 AS SELECT DISTINCT S1, S2, . . . Sn FROM TAB1

A-33-2

CREATE TABLE TEMPO AS SELECT R1, R2, . . . Rp, S1 FROM TAB 1

DROP TABLE TAB1

RENAME TEMP TO TAB1

A-34

A-34-1

CREATE VIEW TEMPO AS SELECT DISTINCT S1, S2, . . . Sn FROM TAB1

CREATE TABLE TAB2

AS SELECT S1, S2, . . . Sn, ROWNUM CODE_TAB2 FROM TEMPO

DROP VIEW TEMPO

A-34-2

CREATE TABLE TEMPO R1, R2, . . . Rp, CODE_TAB2

AS SELECT R1, R2, . . . Rp, CODE_TAB2 FROM TAB1, TAB2

WHERE TAB1.S1=TAB2.S1

AND TAB1.S2=TAB2.S2

AND TAB1.Sn=TAB2.Sn

DROP TABLE TAB1

RENAME TEMPO TO TAB1

A-35

CREATE TABLE TAB1 (COL1 FORMAT1 COL2 FORMAT2 . . . COLn FORMATn)

A-36

SELECT [DISTINCT] COL2 FROM TAB1 WHERE COL1=“US1”

A-37

Select [distinct] COL1 from TAB2

What is claimed is:
 1. An improved processing device of the type thatincludes at least one computer (100, 200), furnished with a central unit(410) with a processor, at least one user peripheral, and a memory(440), which are run by an operating system (450), and a databasemanagement system (470), stored in the at least one computer, and ableto cooperate with the operating system so as to allow a user to performat least one of creation, input, and use of a database that comprises atleast one data table (475) which can be broken down into rows andcolumns, wherein the improvement comprises: a means forming anautonomous meta-dictionary (510), for dynamically storing, withreference to the database, metadata information relating to thestructure of each table of the database, and to the relationshipsbetween tables, an analysis means (520, 530, 550) for determining andstoring at least temporarily a representation of groups of interrelatedcolumns, and comprising a statistical tool (520) able to receive atleast two ranked data sets presented to it, so a to determinerelationships between these data sets, by enumerating distinctoccurrences, the analysis means using at least the statistical tool fordetermining the representation of groups of interrelated columns, and arestructuring module (580, 590) capable of cooperating with the analysismeans and the meta-dictionary with a view to establishing a modifiedversion of the metadata information which takes account of at least onegroup of thus related columns, and to establishing for the user, atleast in a create/input mode, a presentation or a view of the databasewhich takes account of the modified version of the metadata information.2. A device according to claim 1, wherein the analysis means furthercomprises: an analysis module (530) capable of cooperating with themeta-dictionary so as to present the data of at least two differentcolumns of a table to the statistical tool, so as to determine anyrelationship between them, with means (550) for storing at leasttemporarily a representation of groups of related columns, the analysismodule (530) being devised so as to repeat the presentation of pairs ofcolumns, until at least one is found whose two columns are related, oruntil the possibilities are exhausted in the database.
 3. A deviceaccording to claim 2, wherein the analysis module (530) systematicallycarries out the presentation of all the possible pairs of differentcolumns for the said table.
 4. A device according to claim 2, whereinthe analysis module (530) operates for all the tables of the database.5. A device according to claim 2, wherein the statistical tool (520) isable to define: a means (1024) for counting the number of distinctvalues (N_(i), N_(j)) of each of the at least two data sets, a means(1034) for counting the number of distinct values (P) of the pairsformed by the data of like rank of the at least two data sets, and ameans (1042-1049) for returning an information item to at least twocharts, and which is representative of a comparison between the countsN_(i), N_(j) and P.
 6. A device according to claim 5, wherein thestatistical tool (520) is able also to define a means (1012) forcounting a total number of distinct values (N) over a relevant group ofcolumns, and wherein the analysis module (530) stores (550) the columnsfor which N=N_(i), in the guise of candidate keys of a relationshipstarting from a current table.
 7. A device according to claim 5, whereinthe statistical tool (520) is devised so as to return different charts(1043, 1045, 1047, 1049), relating to comparisons pertaining to N_(i),N_(j) and P, and wherein the analysis module (530), when it presents thedata of a pair of columns (COL1 and COL2) of a table to the statisticaltool, is devised so as to react to the said charts, by storing (550)that: if N_(i)=P and N_(j)=P then COL1 and COL2 mutually determine oneanother (interdependency), if N_(i)=N_(j) and N_(i)>P then COL1determines COL2, if N_(i)>N_(j) and N_(i)=P then COL2 determines COL1,and otherwise COL1 and COL2 have no dependency relation.
 8. A deviceaccording to claim 7, wherein the restructuring module comprises arestructuring means (580), associated with a user interface module(570), for allowing the user to select a starting table to be processed,columns to be processed from among at least one group of relatedcolumns, and a primary key of a new table to be chosen from amongcolumns that are interdependent and a new key created for this purpose.9. A device according to claim 8, wherein the restructuring means (580)are advised so as to exclude the primary key of the starting table fromthe said selections.
 10. A device according to claim 8, wherein therestructuring means (580) are able to construct a new table with thedata of a group of related columns, as well as a relationship key withthe relevant table.
 11. A device according to claim 10, wherein therelationship key is a column which remains in the relevant table, whichis reduced so as to preserve this relationship key only, from among thesaid related columns.
 12. A device according to claim 10, wherein therelationship key is a column created both in the new table and in therelevant table, which is reduced so as not to preserve any of the saidrelated columns.
 13. A device according to claim 8, wherein therestructuring means (580) are devised so as to operate the constructionof a new table if a group of at least two related columns (COL1, COL2)has been identified by the analysis module.
 14. A device according toclaim 5, wherein the statistical tool (520) is also able to provide ameans (1210) for counting the total number of pairs of distinct valuesof two data sets which block the dependency relation between these twosets, and, as appropriate, a means (1230) for counting the number ofoccurrences of each of these “blocking” pairs of distinct values, andfor identifying them, whilst the analysis means (530) is devised so asto propose or perform (1250) modifications suitable for forcingdependency between the columns.
 15. A device according to claim 1,wherein, in a data input mode, active means are provided forrestricting, by default, modify access to a part only of the columns ofa group of related columns, the others being simply read accessible. 16.A device according to claim 1, wherein, in a data input mode, activemeans are provided for compelling, by default, the data present inrelated columns of a row to remain consistent with at least partiallyidentical data already existing in other rows.
 17. A device according toclaim 1, wherein the analysis means comprises a storage means thatoperates continuously, dynamically.
 18. A device according to claim 1,wherein the at least one computer includes a server and at least one a“client”, which are networked.
 19. A device according to claim 1,wherein the database management system comprises a relational databasemanagement engine, drivable in the